 political science


Do LLMs have a Gender (Entropy) Bias?

Prabhune, Sonal, Padmanabhan, Balaji, Dutta, Kaushik

arXiv.org Artificial Intelligence

We investigate the existence and persistence of a specific type of gender bias in some popular LLMs and contribute a new benchmark dataset, RealWorldQuestioning (released on HuggingFace), developed from real-world questions across four key domains in business and health contexts: education, jobs, personal financial management, and general health. We define and study entropy bias: a discrepancy in the amount of information generated by an LLM in response to real questions users have asked. We tested this using four different LLMs and evaluated the generated responses both qualitatively and quantitatively by using ChatGPT-4o (as "LLM-as-judge"). Our analyses (metric-based comparisons and "LLM-as-judge" evaluation) suggest that there is no significant bias in LLM responses for men and women at a category level. However, at a finer granularity (the individual question level), there are substantial differences in LLM responses for men and women in the majority of cases; these often "cancel" each other out because some responses are better for men and others are better for women. This is still a concern since typical users of these tools often ask only one specific question, as opposed to several varied ones, in each of these common yet important areas of life. We suggest a simple debiasing approach that iteratively merges the responses for the two genders to produce a final result. Our approach demonstrates that a simple, prompt-based debiasing strategy can effectively debias LLM outputs, producing responses with higher information content than both gendered variants in 78% of the cases and consistently achieving a balanced integration in the remaining cases.
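To make the category-level versus question-level distinction concrete, here is a minimal sketch. It assumes Shannon entropy over whitespace tokens as a stand-in for "amount of information"; the abstract does not specify the paper's metrics, and the function names `shannon_entropy`, `entropy_gap`, and `summarize_gaps` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def shannon_entropy(text: str) -> float:
    """Shannon entropy (in bits) of the whitespace-token distribution of text.

    Assumption: a crude proxy for information content, not the paper's metric.
    """
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    total = len(tokens)
    counts = Counter(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_gap(response_a: str, response_b: str) -> float:
    """Positive when response_a has higher token entropy than response_b."""
    return shannon_entropy(response_a) - shannon_entropy(response_b)


def summarize_gaps(gaps: List[float]) -> Tuple[float, float]:
    """Return (mean gap, mean absolute gap) over per-question gaps.

    A mean gap near zero with a large mean absolute gap is the "cancelling"
    pattern: questions differ individually but average out per category.
    """
    mean_gap = sum(gaps) / len(gaps)
    mean_abs_gap = sum(abs(g) for g in gaps) / len(gaps)
    return mean_gap, mean_abs_gap
```

With per-question gaps of `[0.5, -0.5]`, `summarize_gaps` returns a mean gap of 0.0 but a mean absolute gap of 0.5, which is why an aggregate check alone can hide question-level differences.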


Benchmarking LLMs for Political Science: A United Nations Perspective

Liang, Yueqing, Yang, Liangwei, Wang, Chen, Xia, Congying, Meng, Rui, Xu, Xiongxiao, Wang, Haoran, Payani, Ali, Shu, Kai

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have achieved significant advances in natural language processing, yet their potential for high-stakes political decision-making remains largely unexplored. This paper addresses the gap by focusing on the application of LLMs to the United Nations (UN) decision-making process, where the stakes are particularly high and political decisions can have far-reaching consequences. We introduce a novel dataset comprising publicly available UN Security Council (UNSC) records from 1994 to 2024, including draft resolutions, voting records, and diplomatic speeches. Using this dataset, we propose the United Nations Benchmark (UNBench), the first comprehensive benchmark designed to evaluate LLMs across four interconnected political science tasks: co-penholder judgment, representative voting simulation, draft adoption prediction, and representative statement generation. These tasks span the three stages of the UN decision-making process--drafting, voting, and discussing--and aim to assess LLMs' ability to understand and simulate political dynamics. Our experimental analysis demonstrates the potential and challenges of applying LLMs in this domain, providing insights into their strengths and limitations in political science. This work contributes to the growing intersection of AI and political science, opening new avenues for research and practical applications in global governance. The UNBench Repository can be accessed at: https://github.com/yueqingliang1/UNBench.


Political-LLM: Large Language Models in Political Science

Li, Lincan, Li, Jiaqi, Chen, Catherine, Gui, Fred, Yang, Hongjia, Yu, Chenxiao, Wang, Zhengguang, Cai, Jianing, Zhou, Junlong Aaron, Shen, Bolin, Qian, Alex, Chen, Weixin, Xue, Zhongkai, Sun, Lichao, He, Lifang, Chen, Hanjie, Ding, Kaize, Du, Zijian, Mu, Fangzhou, Pei, Jiaxin, Zhao, Jieyu, Swayamdipta, Swabha, Neiswanger, Willie, Wei, Hua, Hu, Xiyang, Zhu, Shixiang, Chen, Tianlong, Lu, Yingzhou, Shi, Yang, Qin, Lianhui, Fu, Tianfan, Tu, Zhengzhong, Yang, Yuzhe, Yoo, Jaemin, Zhang, Jiaheng, Rossi, Ryan, Zhan, Liang, Zhao, Liang, Ferrara, Emilio, Liu, Yan, Huang, Furong, Zhang, Xiangliang, Rothenberg, Lawrence, Ji, Shuiwang, Yu, Philip S., Zhao, Yue, Dong, Yushun

arXiv.org Artificial Intelligence

In recent years, large language models (LLMs) have been widely adopted in political science tasks such as election prediction, sentiment analysis, policy impact assessment, and misinformation detection. Meanwhile, the need to systematically understand how LLMs can further revolutionize the field has become urgent. In this work, we--a multidisciplinary team of researchers spanning computer science and political science--present the first principled framework, termed Political-LLM, to advance the comprehensive understanding of integrating LLMs into computational political science. Specifically, we first introduce a fundamental taxonomy classifying the existing explorations into two perspectives: political science and computational methodologies. In particular, from the political science perspective, we highlight the role of LLMs in automating predictive and generative tasks, simulating behavior dynamics, and improving causal inference through tools like counterfactual generation; from a computational perspective, we introduce advancements in data preparation, fine-tuning, and evaluation methods for LLMs that are tailored to political contexts. We identify key challenges and future directions, emphasizing the development of domain-specific datasets, addressing issues of bias and fairness, incorporating human expertise, and redefining evaluation criteria to align with the unique requirements of computational political science. Political-LLM seeks to serve as a guidebook for researchers to foster an informed, ethical, and impactful use of Artificial Intelligence in political science. Our online resource is available at: http://political-llm.org/.
Corresponding authors: Yushun Dong (yd24f@fsu.edu) is with the Department of Computer Science, Florida State University; Yue Zhao (yzhao010@usc.edu) is with the Department of Computer Science, University of Southern California; Fred Gui (pgui@lsu.edu) is with the Department of Political Science, Louisiana State University; Catherine Chen (catherinechen@lsu.edu) is with the Manship School of Mass Communication and the Department of Political Science, Louisiana State University.


PoliPrompt: A High-Performance Cost-Effective LLM-Based Text Classification Framework for Political Science

Liu, Menglin, Shi, Ge

arXiv.org Artificial Intelligence

Recent advancements in large language models (LLMs) have opened new avenues for enhancing text classification efficiency in political science, surpassing traditional machine learning methods that often require extensive feature engineering, human labeling, and task-specific training. However, their effectiveness in achieving high classification accuracy remains questionable. This paper introduces a three-stage in-context learning approach that leverages LLMs to improve classification accuracy while minimizing experimental costs. Our method incorporates automatic enhanced prompt generation, adaptive exemplar selection, and a consensus mechanism that resolves discrepancies between two weaker LLMs, refined by an advanced LLM. We validate our approach using datasets from BBC news reports, the Kavanaugh Supreme Court confirmation, and 2018 election campaign ads. The results show significant improvements in classification F1 score (+0.36 for zero-shot classification) with manageable economic costs (-78% compared with human labeling), demonstrating that our method effectively addresses the limitations of traditional machine learning while offering a scalable and reliable solution for text analysis in political science.
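One plausible reading of the consensus mechanism described above is sketched below: two cheaper classifiers label every text, and only their disagreements are escalated to a stronger one. This is an assumption based on the abstract alone, not the authors' implementation; `classify_with_consensus` and the three classifier callables are hypothetical placeholders for real LLM calls.

```python
from typing import Callable, List, Tuple

# A classifier maps a text to a label. In practice each would wrap an LLM call;
# here they are plain callables so the sketch stays self-contained.
Classifier = Callable[[str], str]


def classify_with_consensus(
    texts: List[str],
    weak_a: Classifier,
    weak_b: Classifier,
    strong: Classifier,
) -> Tuple[List[str], int]:
    """Label texts with two weak classifiers; escalate disagreements.

    Returns the final labels and how many texts needed the strong classifier.
    """
    labels: List[str] = []
    escalated = 0
    for text in texts:
        label_a, label_b = weak_a(text), weak_b(text)
        if label_a == label_b:
            # Agreement: accept the shared label without the costly model.
            labels.append(label_a)
        else:
            # Disagreement: defer to the stronger (more expensive) model.
            labels.append(strong(text))
            escalated += 1
    return labels, escalated
```

The design choice this illustrates is cost control: the expensive model is invoked only for the subset of texts on which the cheaper models disagree.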


Alignment Helps Make the Most of Multimodal Data

Arnold, Christian, Küpfer, Andreas

arXiv.org Artificial Intelligence

When studying political communication, combining the information from text, audio, and video signals promises to reflect the richness of human communication more comprehensively than confining it to individual modalities alone. However, the heterogeneity, connectedness, and interaction of such multimodal data are challenging to address when modeling it. We argue that aligning the respective modalities can be an essential step in fully exploiting the potential of multimodal data because it informs the model with human understanding. Taking care of the data-generating process of multimodal data, our framework proposes four principles to organize alignment and, thus, address the challenges of multimodal data. We illustrate the utility of these principles by analyzing how German MPs address members of the far-right AfD in their speeches and by predicting the tone of video advertising in the context of the 2020 US presidential race. Our paper offers important insights to all those keen to analyze multimodal data effectively.


Evaluating the Quality of Answers in Political Q&A Sessions with Large Language Models

Alvarez, R. Michael, Morrier, Jacob

arXiv.org Artificial Intelligence

This paper presents a new approach to evaluating the quality of answers in political question-and-answer sessions. We propose to measure an answer's quality based on the degree to which it allows us to infer the initial question accurately. This conception of answer quality inherently reflects the relevance of answers to the initial questions. Drawing parallels with semantic search, we argue that this measurement approach can be operationalized by fine-tuning a large language model on the observed corpus of questions and answers without additional labeled data. We showcase our measurement approach within the context of the Question Period in the Canadian House of Commons. Our approach yields valuable insights into the correlates of the quality of answers in the Question Period. We find that answer quality varies significantly based on the party affiliation of the members of Parliament asking the questions and uncover a meaningful correlation between answer quality and the topics of the questions.
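The retrieval framing above can be sketched as follows: score each answer by how highly its true question ranks among all candidate questions. This is only an illustration under loud assumptions. A bag-of-words cosine similarity stands in for the fine-tuned language model, reciprocal rank is one possible scoring choice, and the names `cosine` and `reciprocal_ranks` are hypothetical.

```python
import math
from collections import Counter
from typing import List


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (stand-in for a learned similarity)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[token] * vb[token] for token in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def reciprocal_ranks(questions: List[str], answers: List[str]) -> List[float]:
    """answers[i] replies to questions[i]; return 1/rank of the true question.

    A score of 1.0 means the answer points most strongly to its own question;
    lower scores mean other questions look like better matches.
    """
    scores: List[float] = []
    for i, answer in enumerate(answers):
        sims = [cosine(answer, question) for question in questions]
        # Rank of the true question = 1 + number of strictly better matches.
        rank = 1 + sum(1 for s in sims if s > sims[i])
        scores.append(1.0 / rank)
    return scores
```

An answer that talks around its question while echoing the vocabulary of a different one would receive a lower score, which matches the intuition that it makes the original question harder to infer.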


LLMs in Political Science: Heralding a New Era of Visual Analysis

Wang, Yu

arXiv.org Artificial Intelligence

Interest is increasing among political scientists in leveraging the extensive information available in images. However, the challenge of interpreting these images lies in the need for specialized knowledge in computer vision and access to specialized hardware. As a result, image analysis has been limited to a relatively small group within the political science community. This landscape could potentially change thanks to the rise of large language models (LLMs). This paper aims to raise awareness of the feasibility of using Gemini for image content analysis. A retrospective analysis was conducted on a corpus of 688 images. Content reports were elicited from Gemini for each image and then manually evaluated by the authors. We find that Gemini is highly accurate in performing object detection, which is arguably the most common and fundamental task in image analysis for political scientists. Equally important, we show that it is easy to implement as the entire command consists of a single prompt in natural language; it is fast to run and should meet the time budget of most researchers; and it is free to use and does not require any specialized hardware. In addition, we illustrate how political scientists can leverage Gemini for other image understanding tasks, including face identification, sentiment analysis, and caption generation. Our findings suggest that Gemini and other similar LLMs have the potential to drastically stimulate and accelerate image research in political science and social sciences more broadly.



Argumentation in Waltz's "Emerging Structure of International Politics"

Wolska, Magdalena, Fröhlich, Bernd, Girgensohn, Katrin, Gholiagha, Sassan, Kiesel, Dora, Neyer, Jürgen, Riehmann, Patrick, Sienknecht, Mitja, Stein, Benno

arXiv.org Artificial Intelligence

While most prior research into the universe of political discourses is based in the genres of debate and speeches, studies of academic political discourse have been sparse. One of the goals of the project SKILL, from which this paper stems, is to fill this gap. SKILL - A social science lab for research-based learning - is dedicated to building and applying AI technologies to facilitate analysis of argumentation in scholarly articles in political science, especially in the context of teaching International Relations (IR). The ultimate goal of SKILL is to provide students with AI tools which would facilitate comprehension of original articles used as part of teaching syllabi and which would coach them in producing expert argumentation in the field. In order to gain insight into the structure and properties of arguments in the domain of political science theory, we developed an annotation scheme which enables analysis of scholarly IR discourse in terms of interaction between argumentation and types of domain content contributing to arguments. The scheme comprises two orthogonal dimensions: discourse and content domain.


How to Use Large Language Models for Text Coding: The Case of Fatherhood Roles in Public Policy Documents

Lupo, Lorenzo, Magnusson, Oscar, Hovy, Dirk, Naurin, Elin, Wängnerud, Lena

arXiv.org Artificial Intelligence

Recent advances in large language models (LLMs) like GPT-3 and GPT-4 have opened up new opportunities for text analysis in political science. They promise automation with better results and less programming. In this study, we evaluate LLMs on three original coding tasks of non-English political science texts, and we provide a detailed description of a general workflow for using LLMs for text coding in political science research. Our use case offers a practical guide for researchers looking to incorporate LLMs into their research on text analysis. We find that, when provided with detailed label definitions and coding examples, an LLM can be as good as or even better than a human annotator while being much faster (up to hundreds of times), considerably cheaper (costing up to 60% less than human coding), and much easier to scale to large amounts of text. Overall, LLMs present a viable option for most text coding projects.
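The finding that detailed label definitions and coding examples matter suggests a prompt shaped roughly like the sketch below. It is a hypothetical illustration, not the authors' workflow: the function `build_coding_prompt`, the prompt wording, and any labels passed to it are assumptions, and the resulting string would still need to be sent to an LLM of the researcher's choice.

```python
from typing import Dict, List, Tuple


def build_coding_prompt(
    text: str,
    label_definitions: Dict[str, str],
    examples: List[Tuple[str, str]],
) -> str:
    """Assemble a few-shot text-coding prompt.

    label_definitions maps each label to its written definition;
    examples is a list of (example_text, label) pairs coded by humans.
    """
    lines = ["Assign exactly one label to the text.", "", "Labels:"]
    for label, definition in label_definitions.items():
        lines.append(f"- {label}: {definition}")
    lines += ["", "Examples:"]
    for example_text, label in examples:
        lines.append(f'Text: "{example_text}" -> Label: {label}')
    # The final line is left open for the model to complete with a label.
    lines += ["", f'Text: "{text}" -> Label:']
    return "\n".join(lines)
```

Keeping definitions and examples as data rather than hard-coded text makes it straightforward to revise the codebook and rerun the same documents.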